Technical Presentation 2: Policy

Ernesto Carrella

January 18, 2016

Simulating Policies

  • Open Loop
    • Scenario Evaluation
    • Policy Optimization
  • Closed Loop
    • Policy Search
    • Policy Discovery

Open Loop

Simulating Quotas

  • TAC: Total Allowable Catch
  • ITQ: Individual Tradeable Quota

How to simulate markets

  • What is the personal value of quotas?
  • How do traders meet?
  • How do traders bargain?

Reservation price

Buying a quota for one unit of catch this season

\(\Pi\): expected unit profit; \(\lambda\): quota price

\[ \text{Revenue from Buying}= \begin{cases} \Pi - \lambda,& \text{if quota is used} \\ -\lambda, & \text{otherwise} \end{cases} \]

At the reservation price the buyer is indifferent, so the expected value of buying is zero:

\[ E[\text{Revenue from Buying}] = \text{Pr}(\text{Needed})\Pi - \lambda = 0 \]

\[ \lambda^* = \text{Pr}(\text{Needed})\Pi \]

Estimate \(\text{Pr}(\text{Needed})\)

\[ \begin{aligned} q &= \text{Quota owned} \\ c &= \text{Daily catch} \\ t &= \text{Day of the season} \\ T &= \text{Season length} \end{aligned} \] Then the probability that you will need that unit of quota is just: \[ 1 - \text{Pr}(c \leq \frac{q}{T-t}) \]
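
The estimate above can be sketched directly: take the empirical distribution of past daily catches and count how often the catch exceeds the per-day quota budget \(q/(T-t)\). A minimal sketch (function and variable names are illustrative, not from the model's codebase):

```python
import numpy as np

def reservation_price(catch_samples, quota_left, days_left, unit_profit):
    """Reservation price for one unit of quota: Pr(Needed) * Pi.

    Pr(Needed) = 1 - Pr(daily catch <= quota_left / days_left),
    estimated from a sample of past daily catches.
    """
    threshold = quota_left / days_left
    pr_not_needed = np.mean(np.asarray(catch_samples) <= threshold)
    return (1.0 - pr_not_needed) * unit_profit

# toy example: daily catches around 10 units, 2000 units of quota left
rng = np.random.default_rng(0)
catches = rng.normal(10, 2, size=1000)
price = reservation_price(catches, quota_left=2000, days_left=265, unit_profit=5.0)
```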

Reservation price - day 100

Reservation price - day 360

Reservation price - multiple species

For every unit of species 1 I expect to catch \(x_2\) units of species 2

\[ \text{Revenue from Buying}= \begin{cases} \Pi_1 - \lambda_1 + x_2(\Pi_2 - \lambda_2),& \text{needed} \\ -\lambda_1, & \text{otherwise} \end{cases} \]

\[ \lambda_1^* = \text{Pr}(\text{Needed})\left(\Pi_1 + x_2(\Pi_2 - \lambda_2) \right) \]
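
The two-species formula is a one-liner; writing it out makes the sign of the \(x_2(\Pi_2 - \lambda_2)\) term explicit (bycatch quota bought at \(\lambda_2\) can make the species-1 reservation price higher or lower). Names here are illustrative:

```python
def reservation_price_two_species(pr_needed, pi1, pi2, x2, lambda2):
    """Species-1 quota reservation price when each unit of species 1
    drags along x2 units of species-2 catch, whose quota trades at
    market price lambda2."""
    return pr_needed * (pi1 + x2 * (pi2 - lambda2))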

Simulated quota prices

How quotas affect behaviour

  • Opportunity costs
  • Nothing changes in the decision algorithms

Scenario Evaluation

Setup

  • One species of fish
  • Limited quota is imposed
  • Agents are heterogeneous in either
    • Mileage
    • Catchability
  • Quota markets reallocate catches

TAC vs ITQ (mileage)

TAC vs ITQ (catchability)

North-South world

North-South world

  • Both species sell for $10
  • 90% of the quotas are for red fish
  • Different effects between TAC and ITQ

Location choices

ITQs incentivize geographic specialization

Well-Mixed World

TAC Gear

  • Quotas are distributed 90% red, 10% blue

TAC Efficiency

ITQ Gear

ITQ Efficiency

Policy Optimization

  • Adaptive Model
  • Tunable Regulations
  • Find parameters that result in best adaptation

Policy Optimization - How?

  • Model as a complicated black-box function
  • Functional maximization
  • Use Bayesian optimizer
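
The optimization loop can be sketched as follows. The slides use a Bayesian optimizer; this dependency-free sketch swaps in plain random search to show the same black-box structure (all names, and the toy objective, are illustrative stand-ins for the actual simulation):

```python
import random

def run_model(mpa_size):
    """Hypothetical stand-in for one full simulation run: takes a
    policy parameter (here, MPA size) and returns the score."""
    return -(mpa_size - 12) ** 2  # toy objective peaking at 12

def optimize(score_fn, lo, hi, budget=200, seed=0):
    """Black-box maximization by random search. A Bayesian optimizer
    spends the same evaluation budget far more efficiently, which
    matters when each evaluation is an expensive simulation."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(budget):
        x = rng.uniform(lo, hi)
        y = score_fn(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

best_mpa, best_score = optimize(run_model, 0, 50)
```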

Optimal MPA

  • Geographically split world
  • Find the single MPA that maximizes a score

Optimal MPA

\[ \text{Score} = \text{Blue Biomass}_{t=20} \]

Optimal MPA

\[ \text{Score} = \text{Blue Biomass}_{t=20} + \sum_{i=1}^{20} \text{Red Landings}_{t=i}\]

Optimal MPA - Well-mixed

Optimal Quotas

\[ \text{Score} = \text{Blue Biomass}_{t=20} + \sum_{i=1}^{20} \text{Red Landings}_{t=i}\]

  • Geographically split map
  • 300 fishers
  • Very different quota values for TAC and ITQ

Optimal TAC

Optimal ITQ

Well-mixed world?

In a scenario where fishers are unable to respond to incentives the optimal quotas under TACs and ITQs are exactly the same

Pareto Front

Heterogeneous fleets

  • 2 kinds of boat:
    • Small boats
    • Large boats
  • 2 Objectives:
    • Maximize small boat income
    • Maximize efficiency
  • 1 Policy lever:
    • Build MPA

Fairness Front

Right-most solution

Left-most solution

Closed Loop

Threshold Tax

  • Well mixed world
  • Want to incentivize gear change through a landing tax
  • Blue fish worth 3 times red fish

No intervention

Threshold policy

  • Expensive stock gets consumed too rapidly
  • Find tax \(\tau\) on blue landings such that we maximize pre-tax revenue over 20 years: \[ \text{Score} = 10 \times \text{Red Landings} + 30 \times \text{Blue Landings} \]
  • Set tax only if blue biomass is below threshold \(\bar B\)
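
The threshold rule itself is two lines; spelling it out shows this is a feedback policy, not a fixed tax schedule (names are illustrative):

```python
def blue_landing_tax(blue_biomass, threshold, tau):
    """Threshold policy: tax blue landings at rate tau only while the
    blue biomass sits below the threshold B-bar; no tax otherwise."""
    return tau if blue_biomass < threshold else 0.0
```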

Example

  • \(\tau = 30\), \(\bar B = 5\text{M}\)

Optimal

  • \(\tau = 22.84\), \(\bar B = 8\text{M}\)

Welfare analysis

PID Taxation

  • Expensive (blue) stock gets consumed too rapidly
  • Geographically separated
  • Update the tax smoothly so that only about 600 units of blue stock are landed each day
  • Poor man’s quotas
  • Use a PI controller \[ p_{t+1} = a e_t + b \sum_{i=0}^t e_{i} \] \[ e_t = \text{Landings}_t - 600 \]
  • “Autopilot” policy
  • Parameters matter
  • Noise matters
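
A minimal sketch of the controller above, assuming the tax is clamped at zero (no subsidies); class and parameter names are illustrative:

```python
class PITax:
    """PI controller on blue landings: raise the tax when landings
    exceed the daily target, lower it when they fall short."""

    def __init__(self, a, b, target=600.0):
        self.a, self.b = a, b          # proportional and integral gains
        self.target = target
        self.integral = 0.0            # running sum of errors

    def step(self, landings):
        error = landings - self.target
        self.integral += error
        # assumption: negative taxes (subsidies) are clamped to zero
        return max(0.0, self.a * error + self.b * self.integral)
```

The "Parameters matter" bullet is literal: the gains \(a\) and \(b\) decide whether the tax settles smoothly on the 600-unit target or oscillates around it.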

PID Taxation - demo

PID Taxation - optimal

Policy Discovery

Policy Discovery

  • We have some indicators of the fishery
  • We have some action levers we can pull
  • We don’t know how to map indicators to actions

Policy Discovery - MSE

A better understanding of some of the trade-offs, particularly that between catch and catch variation, can be achieved by ‘real-time gaming’ of the MSE, which involves the decision-makers managing simulated populations where they are provided with the data which would actually be available on an annual basis

Policy Discovery

first step

Dynamic programming

  • Given state \(S_t\), you’d like to take actions \(a_t,a_{t+1},\dots\) that maximize \[ \sum_{i=0}^{\infty} \gamma^i R(S_{t+i},a_{t+i})\]
  • By the Bellman equation we can solve this recursively \[ V(S_t) = \max_{a_t} \sum_{S_{t+1}} T(S_t,S_{t+1},a_t) \left( R(S_t,S_{t+1},a_t) + \gamma V(S_{t+1}) \right) \]
  • Works for very simple mathematical models
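
The Bellman recursion above can be sketched as tabular value iteration, which makes the last bullet concrete: it is only feasible when states can be enumerated. Names and array layout are illustrative:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, iters=500):
    """Tabular value iteration for the Bellman recursion.

    T[a][s, s2] : probability of moving s -> s2 under action a
    R[a][s, s2] : reward collected on that transition
    Returns the value V(s) of each state under the optimal policy.
    """
    n_states = T[0].shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        # for each action, expected reward-to-go; then take the best action
        V = np.max([(T[a] * (R[a] + gamma * V)).sum(axis=1)
                    for a in range(len(T))], axis=0)
    return V
```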

Reinforcement Learning

  • We don’t have \(T(S,S',a)\) and \(S\) dimension is huge
  • However
    • We can use some indicators \(I\) to approximate \(S\)
    • Even better we can approximate \(V(S)\) as \(\bar V (I)\)
    • Even better we can approximate \(Q(S,a)\) as \(\bar Q(I,a)\)
    • We can play the game many times, start with a policy and observe what it does
    • Slowly modify the policy by targeting: \[ a^* = \arg \max_a \bar Q(I,a) \]
  • Additive approximations \[ \bar Q(I,a) = \sum_i \beta_i f_i(I,a) \]
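
The bullets above can be sketched as semi-gradient Q-learning with a linear (additive) approximation over the indicators. This is a generic sketch of the technique, not the model's actual feature set; all names are illustrative:

```python
import numpy as np

def features(indicators, action, n_actions):
    """One block of features per action: the indicators are copied
    into the slot belonging to the chosen action."""
    phi = np.zeros(len(indicators) * n_actions)
    phi[action * len(indicators):(action + 1) * len(indicators)] = indicators
    return phi

class LinearQ:
    """Q(I, a) ~= beta . f(I, a), updated from observed transitions
    by semi-gradient Q-learning."""

    def __init__(self, n_indicators, n_actions, alpha=0.01, gamma=0.999):
        self.n_actions = n_actions
        self.beta = np.zeros(n_indicators * n_actions)
        self.alpha, self.gamma = alpha, gamma

    def q(self, ind, a):
        return self.beta @ features(ind, a, self.n_actions)

    def best_action(self, ind):
        return max(range(self.n_actions), key=lambda a: self.q(ind, a))

    def update(self, ind, a, reward, next_ind):
        # TD target: observed reward plus discounted best next value
        target = reward + self.gamma * max(self.q(next_ind, b)
                                           for b in range(self.n_actions))
        td_error = target - self.q(ind, a)
        self.beta += self.alpha * td_error * features(ind, a, self.n_actions)
```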

Biomass-based control

  • 300 Fishers
  • Can’t set quotas
  • Can only open/close fishery each month
  • Biomass and time of year are our only indicators
  • Train it for 1,000 episodes, \(\gamma = .999\)

Bayesian Controller - Quota

Random Controller

20 years

80 years

Comparisons - 20 years

Method                        20 Years     80 Years
Quota - optimized 20 years     412,056      390,581
Biomass controller             352,566    1,058,428
Random controller              398,069      390,678
Anarchy                        230,225      202,231

Revenue-based controller

  • Perfect biomass monitoring is impossible
  • Can we create a controller looking only at average profits and distance from port (human dimensions)?
  • Train it for 2,000 episodes, \(\gamma = .999\)

20 years

80 years

Comparisons

Method                        20 Years     80 Years
Quota - optimized 20 years     412,056      390,581
Biomass controller             352,566    1,058,428
Random controller              398,069      390,678
Anarchy                        230,225      202,231
Cash-distance controller       326,116    1,001,269

Problems

  • Rough around the edges
  • Often does not converge
  • Opaque result, hard to describe \[ \bar Q(I,a) = \alpha + \sum_i \beta_i \cos(c_i \pi I) \]